Belief tracking and action selection in spoken dialog systems

ABSTRACT

An action is performed in a spoken dialog system in response to a user's spoken utterance. A policy which maps belief states of user intent to actions is retrieved or created. A belief state is determined based on the spoken utterance, and an action is selected based on the determined belief state and the policy. The action is performed, and in one embodiment, involves requesting clarification of the spoken utterance from the user. Creating a policy may involve simulating user inputs and spoken dialog system interactions, and modifying policy parameters iteratively until a policy threshold is satisfied. In one embodiment, a belief state is determined by converting the spoken utterance into text, assigning the text to one or more dialog slots associated with nodes in a probabilistic ontology tree (POT), and determining a joint probability based on probability distribution tables in the POT and on the dialog slot assignments.

RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 61/378,303, filed Aug. 30, 2010, the content of which is incorporated by reference herein in its entirety.

FIELD OF THE INVENTION

The exemplary embodiments relate to the field of human-machine dialog systems and more particularly to the implementation of a belief tracking dialog system with a parameterized policy for action selection.

BACKGROUND OF THE INVENTION

A central function of human-machine dialog systems is the tracking and estimation of a user's intentions based on spoken utterances (commonly referred to as “belief tracking”). Belief tracking systems may gather information over several turns of interaction between the systems and a user. The user's dialog may be converted into text using automatic speech recognition (ASR), and the converted text may be processed by a natural language understanding system which extracts the meaning of the text and passes it to a belief tracking system. Given the potential for error in speech recognition and understanding due to noisy observations and other forms of interference, probabilistic approaches to belief tracking are desirable. These probabilistic approaches may utilize Bayesian networks that represent the system's belief as the joint probability space of concepts while leveraging conditional independence among them. These approaches also typically constrain the domain, for example to air travel, in order to improve performance.

The design of a domain-specific Bayesian network requires significant effort and expert knowledge that is not always readily available. Further, real-world systems often yield large networks on which inference is intractable without major assumptions and approximations. The computational requirements of determining the joint distribution of user intentions are typically mitigated by assuming full conditional independence between concepts. However, such an assumption violates existing dependencies between concepts and results in inaccurate joint distributions.

Belief tracking systems are often implemented within or in conjunction with spoken dialog systems capable of performing various actions based on a user's determined intentions. For example, a belief tracking system implemented in a mobile phone may receive the spoken command “call Bill”, and may automatically dial the contact's phone number. Likewise, if the mobile phone belief tracking system does not understand what the user wants, the system may prompt the user to repeat the command. Such systems utilize an action selection policy (hereinafter, “policy”) which maps, in one embodiment, a set of actions to be performed to a distribution of belief states of user intentions. Alternatively, a policy may map a set of actions to be performed to other conversation attributes, such as the emotional state of the speaker, the vocabulary of the speaker, the semantic content of the conversation, or any other attribute related to the conversation or to the speaker.

Multi-turn spoken dialog systems may utilize partially observable Markov decision processes (POMDPs) to track user intentions over a series of turns and to decide which actions to take based on the tracked user intentions. POMDPs maintain and rely on a probability distribution over a set of possible belief states based on previous observations, and update the probability distribution based on each subsequent observation. At each turn, a POMDP performs an action based on the belief state probability distribution according to an action selection policy. In a goal-oriented environment, such as a dialog system in which a user is seeking a particular action to be performed, a POMDP attempts to minimize the number of turns required to perform the action by rewarding correct actions taken and penalizing incorrect actions taken.
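
For illustration only, the following minimal Python sketch shows the kind of per-turn belief update such a POMDP-based system performs; the intent states, transition model, and observation likelihoods are invented placeholders, not part of the embodiments described herein.

    # Minimal sketch of a POMDP-style belief update over user-intent states.
    # The states, observation likelihoods, and transition model below are
    # illustrative assumptions, not the system described in this document.

    STATES = ["find_restaurant", "find_store", "issue_command"]

    def update_belief(belief, observation_likelihood, transition):
        """One turn of belief tracking: predict with the transition model,
        then weight by the likelihood of the new observation and normalize."""
        predicted = {
            s2: sum(belief[s1] * transition[s1][s2] for s1 in STATES)
            for s2 in STATES
        }
        unnormalized = {s: predicted[s] * observation_likelihood[s] for s in STATES}
        z = sum(unnormalized.values()) or 1e-12  # guard against all-zero evidence
        return {s: p / z for s, p in unnormalized.items()}

    # Uniform prior; assume intents rarely change between turns.
    belief = {s: 1.0 / len(STATES) for s in STATES}
    transition = {s1: {s2: 0.9 if s1 == s2 else 0.05 for s2 in STATES} for s1 in STATES}
    # Noisy ASR evidence that weakly favors "find_restaurant".
    belief = update_belief(belief, {"find_restaurant": 0.7,
                                    "find_store": 0.2,
                                    "issue_command": 0.1}, transition)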

An action selection policy may include parameters that dictate the action performed according to a belief state distribution. For example, such a policy may include belief state thresholds, and a dialog system may perform a first action if a particular threshold is met by the belief state and a second action if the threshold is not met by the belief state. The policy parameters may be manually discretized, but doing so is time-consuming, prone to errors, and often ignores the local structure among high-dimensional domain concepts.

SUMMARY OF THE INVENTION

The selection and performance of an action in response to receiving a spoken utterance in a spoken dialog system is described. A policy which includes a mapping of belief states of user intent to actions to be performed is retrieved or created. In one embodiment, the policy may map belief state features to actions. The belief state of user intent is determined based on the spoken utterance. An action is selected based on the determined belief state and the policy. If the selected action requires additional feedback from the user, the additional feedback is requested and the process is repeated. If the action does not require additional feedback from the user, the action is performed.

Creating a policy may include iteratively determining which actions to perform in response to particular belief states using a POMDP. Initial policy parameters are determined by manual selection, based on previously created policies, by random selection, or based on any other suitable criteria. A simulated user input associated with a pre-determined goal is created and the belief state of the simulated user input is determined. In one embodiment, the simulated user input is intentionally obscured by noise in order to best simulate the environment in which an actual user utterance is received and where other audio noise may be present. An action is performed based on the determined belief state and based on the policy parameters. If the goal is not achieved by the performance of the action, the simulated user input is modified, the belief state is re-determined, and an action is re-performed. If the goal is achieved by the performance of the action, a POMDP reward score is determined for the iterative user input and action performance process. If a policy threshold is not satisfied, the policy parameters are modified and this process is continued until the policy threshold is satisfied.

Determining a belief state of user intent may include the use of a probabilistic ontology tree (“POT”). In one embodiment, a POT is created based on an ontology, either pre-existing or modified for the purpose of creating a POT. A probability distribution table is determined for each ontology node, and a Bayesian network is created based on the ontology and the determined probability tables. The built or retrieved POT includes a plurality of unobserved nodes, each node including an associated probability distribution table. The spoken utterance is converted into text using, for example, automatic speech recognition, and the converted text is assigned to one or more dialog slots associated with POT nodes. Observed nodes are created in the POT based on the assigned slots, and probability distribution tables are created for the observed nodes based on the slot assignment confidence and the domain of the observed nodes. A belief state is determined based on the joint probability of the probability distribution tables of the POT's unobserved nodes and observed nodes.

The features and advantages described in the specification are not all inclusive and, in particular, many additional features and advantages will be apparent to one of ordinary skill in the art in view of the drawings and specification. Moreover, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an environment for implementing belief tracking and policy parameterization in a spoken dialog system in accordance with one embodiment.

FIG. 2 illustrates a belief tracking module for using probabilistic ontology trees to determine a joint probability distribution across user intent belief states based on user dialog observations in accordance with one embodiment.

FIG. 3 illustrates the policy module for determining an action to perform based on a determination of a user's intent in accordance with one embodiment.

FIG. 4 illustrates an example probabilistic ontology tree in accordance with one embodiment.

FIG. 5 is a flowchart illustrating the determination of user intent through belief tracking in accordance with one embodiment.

FIG. 6 is a flowchart illustrating the creation of a probabilistic ontology tree in accordance with one embodiment.

FIG. 7 is a flowchart illustrating the performance of an action by a spoken dialog system in response to a vocal query in accordance with one embodiment.

FIG. 8 is a flowchart illustrating the creation of a policy in accordance with one embodiment.

The figures depict various embodiments for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles described herein.

DETAILED DESCRIPTION

Spoken Dialog System Overview

FIG. 1 illustrates an environment for implementing belief tracking and policy parameterization in a spoken dialog system (“SDS”) in accordance with one embodiment. The SDS 100 may be implemented as a standalone system, may be communicatively coupled to another system, or may be implemented as a component within a larger system. In one embodiment, the SDS 100 may be implemented in a vehicle. In this embodiment, a user may interact with the SDS 100 to access information about the vehicle; access information about stores, restaurants, services, or other entities; receive driving directions or other navigation instructions; make phone calls; retrieve and play media; or perform any other function related to a vehicle system. Alternatively, the SDS 100 may be implemented within a mobile phone, a computer, a website, or any other system which utilizes belief tracking and policy parameterization.

The SDS 100 includes a microphone 110, a belief tracking module 120, a policy module 130, and an interface module 140. In alternative embodiments, additional or fewer components may be present, or the functionality of two or more components may be combined (for instance, the belief tracking module 120 and the policy module 130 may be implemented in a single module). The microphone 110 is configured to receive spoken utterances from a user. Spoken utterances include commands, requests, queries, clarifications, system navigation requests, or any sound from a user in vocal form. The microphone 110 records spoken utterances for processing and analysis by the SDS 100.

The belief tracking module 120 determines the intent of a user with regard to the SDS 100 based on observations of the spoken utterances of the user. To determine the intent of a user, the belief tracking module 120 maintains and updates a probability distribution over a set of concepts in one or more domains using one or more POTs. As will be discussed below, when a user speaks an utterance, the belief tracking module 120 converts the utterance to text and then extracts observations from the text with a certain degree of confidence. The belief tracking module 120 then computes a joint probability distribution of the most likely user intent hypotheses to determine the belief state of the user intent.

The policy module 130 stores a policy which maps a set of actions, which the SDS 100 can perform or can request another system to perform, to a set of user intent belief states. In one embodiment, the policy module 130 uses the belief tracking module 120 to determine the intent of a user, and requests an action to be performed from an interface module 140 in response to the determined intent of the user and based on the stored policy. The policy module 130 may optionally include training functionality to create and/or modify a policy. As will be discussed below, the policy module 130 may utilize a POMDP decision maker to adjust policy parameters automatically in training in order to optimize the policy implemented by the policy module 130.

The interface module 140 includes the functionality to interface with the SDS 100 or an external system in order to perform an action requested by the policy module 130. In one embodiment, the interface module 140 performs actions related to the SDS 100. For example, the interface module 140 may request clarification from a user, may confirm a user utterance, or may inform a user of information requested by a user. Alternatively, the interface module 140 may turn on or turn off the SDS 100, may restart a user interaction, or may cancel an interaction. The interface module 140 may retrieve information requested by a user from, for example, the internet, a database, or local storage. The interface module 140 may interface with a vehicle to perform vehicle functions, may interface with a phone to perform phone functions, may interface with a navigation system to perform navigation functions, or may interface with any other system communicatively coupled to the SDS 100.

In an example use case, the SDS 100 receives a query from a user via the microphone 110. The belief tracking module 120 converts the utterance to text, and, as will be discussed below, assigns the text to one or more dialog slots. Based on the text assigned to slots, the confidence of the conversion from the utterance to text, and the confidence of the slot assignment, the belief tracking module 120 determines a joint probability distribution for the user intent belief state. Based on this joint probability distribution, the policy module 130 determines an action to perform, and the interface module 140 performs this action.

FIG. 2 illustrates the belief tracking module 120 for using POTs to determine a joint probability distribution across user intent belief states based on user dialog observations in accordance with one embodiment. The belief tracking module 120 as illustrated includes a computer processor 200 and a memory 210. Note that in other embodiments, the belief tracking module 120 may include additional features other than those illustrated in FIG. 2. FIG. 3 illustrates the policy module 130 for determining an action to perform based on a determination of a user's intent in accordance with one embodiment. The policy module 130 as illustrated includes a computer processor 300 and a memory 310. Note that in other embodiments, the policy module 130 may include additional features other than those illustrated in FIG. 3.

In one embodiment, the belief tracking module 120 and the policy module 130 are implemented with separate processors and memories. Alternatively, the belief tracking module 120 and the policy module 130 may be implemented with the same processor and memory. For example, the modules of the memory 210 and the modules of the memory 310 may be implemented in a single memory. In one embodiment, the belief tracking module 120 is implemented in the belief tracking module 330. For the sake of simplicity, the belief tracking module 120 and the policy module 130 are discussed separately for the remainder of this description.

The processor 200 and the processor 300 process data signals and may include various computing architectures including a complex instruction set computer (CISC) architecture, a reduced instruction set computer (RISC) architecture, or an architecture implementing a combination of instruction sets. Although only a single processor is shown in each of FIG. 2 and FIG. 3, multiple processors may be included. The processors 200 and 300 may include arithmetic logic units, microprocessors, general purpose computers, or some other information appliances equipped to transmit, receive and process electronic data signals from and between each other and the memories 210 and 310, the microphone 110, and the interface 140.

The memories 210 and 310 store instructions and/or data that may be executed by processors 200 and 300, respectively. The instructions and/or data may comprise code (i.e., modules) for performing any and/or all of the techniques described herein. The memories 210 and 310 may be any non-transitory computer-readable storage medium such as a dynamic random access memory (DRAM) device, a static random access memory (SRAM) device, Flash RAM (non-volatile storage), combinations of the above, or some other memory device known in the art.

In the embodiment of FIG. 2, the memory 210 includes a POT module 220, a semantic understanding module 230, an observation module 240, and a joint probability module 250. In the embodiment of FIG. 3, the memory 310 includes a training module 320, a belief tracking module 330, a policy interface 340, and a system action module 350. Note that in other embodiments, the memories may include additional or fewer modules to perform the functionalities described herein. The modules stored within the memory 210 are adapted to communicate with each other and the processor 200. The modules stored within the memory 310 are adapted to communicate with each other and the processor 300. The modules of memories 210 and 310 are also adapted to communicate with the microphone 110, the interface module 140, the SDS 100 and any external systems communicatively coupled to the SDS 100.

Belief Tracking

The belief tracking module 120 determines a joint probability distribution of a belief state of user intent. The joint probability distribution determined by the belief tracking module 120 is based on a user utterance and a POT stored by the POT module 220. A POT is a tree-structured Bayesian network that extends a domain ontology by specifying probability distributions for Bayesian network node values. The POT module 220 may store any number of POTs. FIG. 4 illustrates an example POT in accordance with one embodiment. For example, the POT of FIG. 4 may be implemented in a vehicle navigation system. Each POT node represents a particular concept or sub-concept within a domain. The POT of FIG. 4 includes an action node 400, a venue node 402, a command node 404, and so forth. The action node 400 represents the possible actions a user may take, namely finding a venue (associated with the venue node 402) and issuing an SDS command (associated with the command node 404). It should be noted that although probability tables are only illustrated for some nodes, each node has an associated probability table. Further, it should be emphasized that a typical POT may have several hundred or more nodes.

There are two types of POT nodes: specialization nodes (which represent IS-A relationships) and composition nodes (which represent HAS-A relationships). If two nodes have an IS-A relationship, the first of the nodes represents a concept which is a subset of the concept represented by the second of the nodes. For example, the venue node 402 represents the possible venues a user may search, namely a restaurant (associated with the restaurant node 406), a store (associated with the store node 408), and a service (associated with the service node 410). In this example, the restaurant node 406, the store node 408, and the service node 410, which represent subsets of the concept “venue”, are specialization nodes. If two nodes have a HAS-A relationship, a first of the nodes represents a concept which is a property of the concept represented by the second of the nodes. For example, the venue node 402 also represents properties of the possible venues, such as the street a venue is located on (associated with the street node 412). In this example, the street node 412 is a composition node, and is a child node of the venue node 402.

In one embodiment, specialization nodes have exactly one parent, and the sets of subconcepts associated with specialization nodes sharing a common parent node are disjoint. In the embodiment of FIG. 4, the action node 400, the venue node 402, the command node 404, the restaurant node 406, the store node 408, the service node 410, the start node 414, and the cancel node 416 are specialization nodes. In one embodiment, a composition node may have one or more parent nodes, but may only have more than one parent node if all of the parent nodes are specialization nodes and in turn have a common parent node. In the embodiment of FIG. 4, the street node 412, the cuisine node 418, the hours node 420, the price node 422, the storetype node 424, and the servicetype node 426 are composition nodes.

POTs include nodes representing concepts independent of a user utterance (“unobserved nodes”) and nodes representing concepts observed in a user utterance (“observed nodes”). Unobserved nodes exist for each domain concept and subconcept represented by the POT. Observed nodes exist for domain concepts and subconcepts which the belief tracking module 120 has determined are associated with a user's utterances. In operation, the belief tracking module 120 retrieves a POT from the POT module 220, and adds observed nodes to the retrieved POT based on a user's utterances. Observed nodes are discussed below with regard to the observation module 240.

Each POT node has an associated probability distribution. The POT root node has a probability distribution representing the probability that the concept represented by the root node takes each possible value. In the example of FIG. 4, the action node 400 has an associated probability distribution indicating a 60% chance that the action node 400 takes the value “venue” and a 40% chance that the action node takes the value “command”. In one embodiment, each POT has exactly one root node.

Non-root unobserved nodes have a probability distribution representing the probability that the concept represented by the unobserved node takes a possible value given the value of the unobserved node's parent node. Further, the probability distributions of non-root unobserved nodes include a null element for the cases where the unobserved node concept is inapplicable given the value of the parent node. In the example of FIG. 4, if the venue node 402 takes the value “restaurant”, the store node 408 has a 40% chance of taking the value “Bob's” and a 60% chance of taking the value “null”. Likewise, if the venue node 402 takes the value “store”, the store node 408 has a 30% chance of taking the value “Al's”, a 20% chance of taking the value “Bob's”, and a 50% chance of taking the value “Carl's”. Finally, if the value of the venue node 402 is “service”, the store node 408 has a 100% chance of taking the value “null”. These probability distributions indicate that, for example, there are three possible stores (Al's, Bob's, and Carl's), that Bob's is both a restaurant and a store, and that neither Al's, Bob's, nor Carl's provides a service relevant to the service node 410.
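
As an illustration of how such parent-conditioned distributions might be represented in software, the following Python sketch encodes the store node example above; the class layout is an assumption made for exposition, not the structure used by the POT module 220.

    # A minimal sketch of a POT node with a parent-conditioned probability
    # table, populated with the FIG. 4 store-node example from the text.
    # The class layout is an illustrative assumption, not the patent's code.

    class POTNode:
        def __init__(self, name, cpt):
            # cpt maps a parent value to a distribution over this node's values;
            # "null" carries the probability that the concept is inapplicable.
            self.name = name
            self.cpt = cpt

    store_node = POTNode("store", {
        "restaurant": {"Bob's": 0.4, "null": 0.6},
        "store":      {"Al's": 0.3, "Bob's": 0.2, "Carl's": 0.5},
        "service":    {"null": 1.0},
    })

    # P(store = "Bob's" | venue = "restaurant") -> 0.4
    print(store_node.cpt["restaurant"]["Bob's"])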

Observed nodes have an associated probability distribution representing the likelihood that a user spoke an utterance associated with an intended node and an intended node value, and that the SDS 100 correctly or incorrectly identifies the intended node value. In addition, the probability distributions associated with observed nodes represent the likelihood that a user spoke an utterance associated with an unintended node and an intended node value, and that the SDS correctly or incorrectly identifies the intended node value. The probability distributions associated with observed nodes are discussed below in greater detail with regard to the observation module 240.

The POTs stored by the POT module 220 may be created in advance by the SDS 100. POTs are built based on one or more ontologies. In one embodiment, a POT is built based on an established ontology. Alternatively, an ontology may be created, modified or customized for the purposes of creating a POT. Once an ontology is selected, a Bayesian network is created based on the ontology and with the same graphical structure as the ontology. A probability distribution table is then created for each node of the Bayesian network.

The entries of the probability distribution tables may be based on expert knowledge, learned data, or any other suitable means. Null semantics are applied to the probability distributions in order to ensure that inconsistent instantiations of the ontology have a probability of 0 in the POT. As discussed above, the entries of the probability distribution table for specialization nodes may be conditional upon the value of the specialization node's parent node. Likewise, the entries of the probability distribution table for composition nodes may be conditional upon the value of one or more of the composition node's parent nodes. In one embodiment, if the value of the parent node is null, the probability that the value of the child node is null is 1. If the value of a composition node is required to exist for a non-null value of the composition node's parent node, then the composition node is an essential node. In one embodiment, if the parent node of an essential node has a non-null value, the probability that the essential node has a value of null is 0.

A POT is retrieved from the POT module 220 in response to receiving a spoken utterance via the microphone 110. Each node of the POT is associated with a dialog slot. The semantic understanding module 230 uses ASR to convert the spoken utterance into text. A word confidence score may be assigned to each word converted from utterance into text based on the confidence that ASR converted the utterance into text correctly. The semantic understanding module 230 then assigns the text to dialog slots based on the word confidence scores and grammar rules. A slot confidence score may be assigned to the assigned dialog slots based on the degree of certainty in the observation. The same word or words may be assigned to multiple dialog slots, and each text-to-dialog-slot assignment is assigned a slot confidence score.

The semantic understanding module 230 generates one or more hypotheses based on the dialog slots to which text is assigned. In one embodiment, a hypothesis is generated for each unique combination of assignments of text to dialog slots such that each text word is assigned to no more than one dialog slot at a time. A hypothesis confidence score is assigned to each hypothesis and may be based on how many dialog slots are matched, the word and slot confidence scores, the dependencies and relatedness between the assigned slots, or any other suitable factor.
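
The following Python sketch illustrates one plausible way to score competing hypotheses; the particular scoring rule (summing the products of word and slot confidence scores, which also rewards matching more slots) is an assumption, as the text leaves the exact combination of factors open.

    # Illustrative sketch of scoring alternative text-to-slot hypotheses.
    # The additive scoring rule is an assumption; the text says only that
    # hypothesis scores may combine slot counts and confidence scores.

    def hypothesis_score(assignments):
        # assignments: (word_confidence, slot_confidence) pairs, one per
        # text-to-dialog-slot assignment in the hypothesis.
        return sum(word_conf * slot_conf for word_conf, slot_conf in assignments)

    hypotheses = {
        "price + storetype": [(0.9, 0.8), (0.7, 0.9)],   # two slots matched
        "price only":        [(0.9, 0.8)],               # one slot matched
    }
    best = max(hypotheses, key=lambda name: hypothesis_score(hypotheses[name]))
    print(best)  # "price + storetype": 1.35 beats 0.72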

In one embodiment, the semantic understanding module 230 selects the hypothesis with the highest hypothesis confidence score. The semantic understanding module 230 then identifies the nodes associated with the dialog slots of the determined hypothesis. The observation module 240 creates observed nodes in the POT retrieved from the POT module 220 based on the identified nodes, and creates probability distribution tables for each of the observed nodes. Each observed node is created as a child node of the associated unobserved node in the POT. In the example of FIG. 4, a hypothesis contains dialog slot assignments associated with the price node 422 and the storetype node 424. As a result, the price node 430 is created as a child node to the price node 422 and a storetype node 432 is created as a child node to the storetype node 424, where both the price node 430 and the storetype node 432 are observed nodes. The probability distribution table entries for observed nodes may be based on the confidence score of the selected hypothesis, based on the potential values of POT nodes, and based on the value of a POT node intended by a user in the spoken utterance.

In one embodiment, the probability distribution table entries are computed for an observed node using the following equations:

$$\Pr(\hat{X} = x \mid X = x) = \frac{1 + \frac{c\,(D(X) - 1)}{100}}{D(X)} \qquad (1)$$

$$\Pr(\hat{X} \neq x \mid X = x) = \frac{1 - \frac{c}{100}}{D(X)} \qquad (2)$$

$$\Pr(\hat{X} = x \mid X \neq x) = 1 - \varepsilon\,(D(X) - 1) \qquad (3)$$

$$\Pr(\hat{X} \neq x \mid X \neq x) = \varepsilon \qquad (4)$$

In equations 1-4, X is a random variable representing the user-intended values of the observed node, X̂ is a random variable representing the observed semantic understanding of the observed node, x is the observed value for the observation node, D(X) is the number of possible values of X, c represents the slot confidence score for the dialog slot associated with the observation node with values in the interval [0, 100], and ε represents a non-zero error margin, such as ε=10⁻¹⁰.
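
A direct transcription of equations 1-4 into Python follows; d stands for D(X), and the function name and the final sanity check are illustrative only.

    # Direct transcription of equations (1)-(4) for an observed node's
    # probability table. d is the number of possible values D(X), c is the
    # slot confidence score in [0, 100], and eps is the error margin.

    EPS = 1e-10

    def observed_node_cpt(d, c, eps=EPS):
        p_match = (1 + c * (d - 1) / 100) / d      # (1) Pr(X_hat = x | X = x)
        p_miss  = (1 - c / 100) / d                # (2) Pr(X_hat != x | X = x)
        p_false = 1 - eps * (d - 1)                # (3) Pr(X_hat = x | X != x)
        p_other = eps                              # (4) Pr(X_hat != x | X != x)
        return p_match, p_miss, p_false, p_other

    # With full confidence (c=100) the observation is deterministic:
    assert observed_node_cpt(d=3, c=100)[0] == 1.0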

In one embodiment, a user may be given the option to confirm or reject a hypothesis. In this embodiment, the observation module 240 may not create observation nodes until the user confirms the hypothesis, or may create observation nodes but may not add the observation nodes to the POT until the user confirms the hypothesis. Likewise, the observation module 240 may discard the hypothesis, or may remove the observation nodes associated with the hypothesis from the POT if the user rejects the hypothesis.

Once a hypothesis is selected and observation nodes are added to the retrieved POT, the joint probability module 250 computes a joint probability distribution for combinations of POT node values using the probability distribution tables associated with each POT node, conditioned on the observed variables. The combination of POT node values that results in the most likely joint probability is output by the joint probability module 250 as the most probable explanation of user intent. The joint probability module 250 may compute the joint probability for every combination of POT node values, but this requires extremely large amounts of storage space, processing power, and time. These requirements are directly affected by the number of nodes in the retrieved POT and the number of possible values per node.

To reduce the requirements of exhaustively determining the joint probability distribution for all combinations of node values, the joint probability module 250 utilizes a message-passing algorithm to determine the most probable explanation of user intent. The message-passing algorithm determines node-level joint probabilities in reverse topological order, starting at the leaf nodes of the POT. Once the joint probability distribution is determined for a particular node, the top m joint probabilities are passed in message form to the node's parent node. It should be noted that the design of the POT ensures that the m-best joint probabilities are consistent across specializations, and that one and only one specialization node value is applicable per node in any joint probability.

The message-passing algorithm requires a node to receive a message from all children nodes of the node before the node can in turn send a message to the node's parent node. The message-passing algorithm defines the messages passed from leaf nodes to parent nodes as the identity vector. For a non-leaf node, the message passed to a parent node is a vector in which each entry is the product, taken over all child nodes, of a top-m joint probability received in a child node's message weighted by that child node's probability distribution table value; one such entry is computed for each combination of descendant nodes (child nodes, grandchild nodes, and so on) represented by the top m joint probabilities received from child node messages. Once these joint probabilities are computed, the non-leaf node passes the top m joint probabilities to the node's parent node.

In one embodiment, the message-passing algorithm is expressed as:

if X is a leaf node then
    ψ_X(x) ← 1, ∀x ∈ D(X)
    return ψ_X
end if
for x ∈ D(X) do
    for z⃗ = ((y₁, z⃗₁), . . ., (y_k, z⃗_k)) ∈ {D(ψ_{Y₁}) × . . . × D(ψ_{Y_k})} do
        ψ′_X(x, z⃗) ← ∏_i [ Pr(Y_i = y_i | X = x) × ψ_{Y_i}(y_i, z⃗_i) ]
    end for
    ψ_X(x) ← top m elements of ψ′_X(x)
end for
return ψ_X

In this algorithm, X is a random variable which represents a node for which a message is computed, ψ_X represents the message passed by the node, z⃗ represents the values of descendant nodes associated with the top m joint probabilities received by the node, Y_i represents the children nodes of the node, and ψ_{Y_i} represents messages received from the children nodes. By limiting the probability distributions passed in message form to a parent node to the top m joint probabilities, less likely joint probabilities are discarded early, reducing the computational complexity required to maintain all permutations of node values.
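
The following runnable Python sketch implements the recursion above for a single node; the data layout (messages as lists of probability/assignment pairs, a dictionary-based probability table) is an assumption, and for brevity the top-m truncation is applied once per message rather than once per node value x.

    # Sketch of one step of the m-best message-passing recursion. The data
    # layout is an illustrative assumption, not the patent's implementation.

    from itertools import product

    M = 3  # number of joint probabilities retained per message (the "top m")

    def pass_message(node, domain, children, cpt, child_messages):
        # cpt[child][x][y] is Pr(child = y | node = x); child_messages[child]
        # is that child's message, a list of (probability, assignment) pairs.
        # With no children, this yields the identity message of a leaf node.
        candidates = []
        for x in domain:
            for combo in product(*(child_messages[ch] for ch in children)):
                prob, assignment = 1.0, {node: x}
                for ch, (p_child, sub) in zip(children, combo):
                    prob *= cpt[ch][x][sub[ch]] * p_child
                    assignment.update(sub)
                candidates.append((prob, assignment))
        # Discard all but the top-M joint probabilities before passing upward.
        return sorted(candidates, key=lambda t: -t[0])[:M]

    # A leaf node simply returns the identity message over its domain:
    leaf_msg = pass_message("store", ["Al's", "Bob's", "null"], [], {}, {})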

The message-passing algorithm is iterated for successively higher levels of the POT hierarchy until the top m joint probabilities are determined for the POT's root node. The root node's highest joint probability is associated with a set of node values that represents the most probable user intent. The joint probability module 250 outputs this set of node values as the most probable user intent hypothesis. In the example of FIG. 4, if the highest joint probability at the action node 400 is associated with the values {store, Al's, $$$, . . . }, then the hypothesis output by the joint probability module 250 representing the belief state of the user's intent in dialog slot form would be (venue=store, store=Al's, price=$$$) and so forth. In one embodiment, the joint probability module 250 outputs more than one hypothesis, and the user selects which (if any) hypothesis is correct.

Policy Parameterization

The policy module 130 determines an action to perform based on a determined belief state and a stored policy, and performs the action. The policy module 130 also may contain the functionality to train a policy for later use. As discussed above, a policy is a function which maps a set of actions to a set of belief states of user intentions. For example, a policy may map an uncertain or unconfident belief state of a user's intentions to a “request more information” action. In one embodiment, the set of actions which a policy maps to belief states is limited to requesting clarification from a user, confirming one or more dialog slots, or informing a user. In an alternative embodiment, the set of actions may include additional actions, as discussed above.

The training module 320 creates a policy based on simulated user interactions for which the intended user action, or the goal, is known ahead of time. A simulated user interaction includes a series of user-machine turns, wherein each turn includes receiving a simulated user utterance, determining the belief state of the user's intent based on the utterance, and performing an action based on the belief state. The simulated user utterances may be intentionally obscured by noise in order to best simulate receiving actual user utterances through a microphone in an environment where other audio noise may be present. The training module 320 utilizes a POMDP decision-maker to update policy parameters by maintaining a reward score for each simulated user interaction. As discussed above, in a goal-oriented environment, a POMDP attempts to minimize the number of turns required to perform a user-intended action through the use of a reward/penalty system. Each time an action other than the intended action is performed, a penalty may be applied to the reward score. For instance, the reward score may be reduced by 1. In order to minimize the number of turns required to perform a user-intended action, the reward scores are optimized by iteratively modifying policy parameters, re-simulating the user interactions, and selecting the policy parameters which result in the highest reward scores.

The training module 320 first retrieves an initial policy. The initial policy includes initial policy parameters which may be manually selected by a system operator, may be selected based on previously trained policies, may be randomly selected, or may be selected based on any other criteria. The policy may be stochastically represented by the probability distribution π_θ(a_t | b_t), parameterized by the vector θ, wherein a_t represents an action to be taken given the belief state b_t. The discounted long-term expected reward score may be represented by the equation:

$$J(\theta) = E\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(g, a_t)\right] \qquad (5)$$

In equation 5, E represents the expectation operator, γ represents a discount factor in the interval (0, 1) which reduces future rewards based on the number of turns until the reward, r represents the reward score to be awarded, and g represents the goal or intended action.
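
As a concrete illustration of equation 5, the Python fragment below computes the discounted return of a single simulated dialog under the per-turn penalty scheme described above; the final goal reward of 10 and the episode itself are invented for the example.

    # Equation (5) evaluated over one simulated dialog: a Monte Carlo sum
    # of discounted per-turn rewards. The -1 penalty per wrong action
    # follows the training description; the episode values are invented.

    GAMMA = 0.95  # discount factor in (0, 1)

    def discounted_return(rewards, gamma=GAMMA):
        return sum(gamma ** t * r for t, r in enumerate(rewards))

    # Three turns: two penalized clarification turns, then the goal action.
    episode_rewards = [-1, -1, 10]
    print(discounted_return(episode_rewards))  # -1 - 0.95 + 0.9025*10 = 7.075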

The expectation function of equation 5 integrates over all possible goals and all sequences of actions and observations. To determine the optimal policy parameters, the training module 320 attempts to find the parameters which optimize the value of the reward score expectation function J(θ). It should be noted that because the optimization technique used herein requires a stochastic policy, the policy π_θ is probabilistic. Further, because the policy is stochastic, the policy helps the optimization process avoid becoming locked in to suboptimal local maxima.

In order to optimize the policy parameters, the gradient ∇J(θ) of the reward score expectation function is determined. In one embodiment, the natural gradient ∇̃J(θ) is determined in place of the gradient ∇J(θ). The natural gradient of the reward score expectation function may be determined using the equation:

$$\tilde{\nabla}J(\theta) = G_{\theta}^{-1}\, \nabla J(\theta) \qquad (6)$$

In equation 6, G_θ represents the Fisher information matrix, which takes into account the curvature of the parameter space. The natural gradient ∇̃J(θ) represents the steepest direction of ascent with respect to a corresponding Riemannian metric. In the POMDP setting, the gradient can be computed using the episodic Natural Actor-Critic (eNAC) algorithm. The eNAC algorithm approximates the gradient using sample trajectories from the POMDP model. More information about the natural gradient can be found in S. Amari's “Natural gradient works efficiently in learning” in Neural Computation, Vol. 10, No. 2, pages 251-276, which is incorporated by reference herein. More information on the eNAC algorithm can be found in J. Peters' and S. Schaal's “Natural actor-critic” in Neurocomputing, Vol. 71, Iss. 7-9, pages 1180-1190, which is incorporated by reference herein.

In one embodiment, the eNAC algorithm computes the gradient by solving for ω in the equation:

$$\sum_{t=0}^{T} \gamma^{t}\, \nabla \log \pi_{\theta}(a_t \mid b_t)^{Tr}\, \omega + V_{\theta}(b_0) = \sum_{t=0}^{T} \gamma^{t} r_t \qquad (7)$$

In equation 7, V_θ(b₀) is the value of the belief state b₀, which has an expected value of J(θ), T represents the number of turns in a trajectory sampled during training, and Tr represents the transposition operator. Equation 7 is essentially a regression problem, with one such equation per sampled trajectory, and can be solved for ω as long as the number of sampled trajectories is greater than the number of parameters in θ. Once equation 7 is solved for ω, the training module 320 uses ω as an approximation of the natural gradient, i.e. ω ≈ ∇̃J(θ).
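
The regression in equation 7 can be set up as an ordinary least-squares problem with one row per sampled trajectory. The sketch below is one plausible realization in Python with NumPy; the helper name and data layout are assumptions.

    # Sketch of the eNAC regression of equation (7): stack one row per
    # sampled trajectory and solve for omega (plus the scalar value term)
    # by least squares. Feature dimensions and inputs are placeholders.

    import numpy as np

    def enac_gradient(grad_log_pi, rewards, gamma=0.95):
        # grad_log_pi[i][t]: gradient of log pi_theta(a_t | b_t) at turn t
        # of trajectory i; rewards[i][t]: reward at turn t of trajectory i.
        rows, targets = [], []
        for grads, rs in zip(grad_log_pi, rewards):
            feat = sum(gamma ** t * g for t, g in enumerate(grads))
            rows.append(np.append(feat, 1.0))     # trailing 1 multiplies V(b0)
            targets.append(sum(gamma ** t * r for t, r in enumerate(rs)))
        solution, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
        return solution[:-1], solution[-1]        # omega ~ natural gradient, V(b0)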

The gradient points in the direction of the steepest slope in the parameter space. Once the gradient is determined, the training module 320 takes a step in this direction by adjusting the policy parameters as indicated by the gradient and may iteratively repeat this process until a convergence metric is satisfied. In one embodiment, the convergence metric is satisfied when a particular reward score is consistently achieved during simulation. In an alternative embodiment, the convergence metric is satisfied when the change of parameters consistently fails to exceed a pre-determined threshold. Alternatively, instead of satisfying a convergence metric, the training module 320 may simulate user interactions for a threshold number of iterations. After the training module 320 satisfies a convergence metric or an iteration threshold, the policy parameters are stored as an action-belief policy.

The training module 320 adjusts the policy parameters as indicated by the computed gradient so that a particular action is selected based on a particular belief state. It should be noted that the belief state is represented by a vector in a high-dimensional continuous space. Accordingly, determining the belief state values to map to a particular action in such a space may require a large amount of computing power, time and simulation iterations. To reduce these requirements, the training module 320 may instead map belief state features to actions. Belief state features are represented by φ, and may include all the information provided by the user in past turns. In other embodiments, parameterization can also be extended to other features, such as concepts provided in the last turn by the user, features that express confidence such as the ratio of likelihood of the top two beliefs, or any other suitable feature.

The training module 320 may map belief features to a probability distribution over actions, where π_θ(a|φ) represents the probability of taking action a given the belief features φ. It should be noted that belief features φ map the belief state to R^N, where R represents the real number domain, and where N represents the number of features. In one embodiment, a general policy is created to select between D possible actions. A Gibbs policy may be constructed to select from these actions, and may be represented by the equation:

$$\pi_{\theta}(a \mid \varphi) = \frac{e^{[M_{\theta}]_{a}\,\varphi}}{\sum_{a'} e^{[M_{\theta}]_{a'}\,\varphi}} \qquad (8)$$

In equation 8, M_θ represents a matrix such that M_θ ∈ R^(D×N), and [M_θ]_a represents the a-th row of the matrix M_θ.
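
Equation 8 is the familiar softmax over the per-action scores [M_θ]_a φ. A minimal Python rendering follows; the matrix and feature values are placeholders.

    # Equation (8) as code: a softmax over the action scores [M_theta]_a phi.
    # Matrix shapes follow the text (D actions, N belief features); the
    # numeric values are placeholders.

    import numpy as np

    def gibbs_policy(M_theta, phi):
        """Return pi_theta(a | phi) for every action a."""
        scores = M_theta @ phi            # one vote per action: [M_theta]_a phi
        scores -= scores.max()            # stabilize the exponential
        weights = np.exp(scores)
        return weights / weights.sum()

    M_theta = np.array([[0.5, -0.2], [0.1, 0.9], [-0.3, 0.4]])  # D=3, N=2
    phi = np.array([1.0, 0.5])
    print(gibbs_policy(M_theta, phi))     # a distribution summing to 1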

In one embodiment, for any parameterization of M_θ, the log-policy gradient can be written as:

$$\nabla \log \pi_{\theta}(a \mid \varphi) = \nabla([M_{\theta}]_{a}\,\varphi) - \sum_{a'} \pi_{\theta}(a' \mid \varphi)\, \nabla([M_{\theta}]_{a'}\,\varphi) \qquad (9)$$

The structure of M_θ may take into account the local structure of dialog slots. In one embodiment, we assume a dialog system with n slots, a set of n local actions which each act on only a single slot (e.g. a request action), a single global action which acts on all slots (e.g. an inform action), and a particular slot-based structure for the belief features defined by the equation:

$$\varphi = [\varphi_0; \ldots; \varphi_{n-1}] \in R^{nm} \qquad (10)$$

In equation 10, m represents the number of values per slot (the number of local features for each slot). Note that each slot contributes local features φ_i ∈ R^m, that each φ_i vector is m-dimensional, and that each entry of φ is associated with a particular slot or with the global action.

Continuing with this embodiment, it should be noted that parameters may be exhaustively learned, but this would require learning on the order of n²m parameters. Alternatively, the slot-based structure of the belief features can be leveraged to reduce the number of parameters which must be learned. In this embodiment, each set of local features φ_i independently votes for an action for the slot associated with the set of local features. In addition, in this embodiment, all sets of local features vote for the global action. This system of voting may be represented by the equation:

$$M_{\theta} = \begin{bmatrix} \hat{\theta}_{0}^{Tr} & & \\ & \ddots & \\ & & \hat{\theta}_{n-1}^{Tr} \\ \bar{\theta}_{0}^{Tr} & \cdots & \bar{\theta}_{n-1}^{Tr} \end{bmatrix} \in R^{(n+1) \times nm} \qquad (11)$$

In equation 11, θ̂_i and θ̄_i correspond to the contributions the i-th slot makes to local and global actions, respectively. Note that each sub-vector must be the same size as each set of local features, resulting in a total of 2nm parameters.

In this embodiment, for a parameter vector constructed by stacking the components from equation 11, namely [θ̂₀; . . . ; θ̂_{n-1}; θ̄₀; . . . ; θ̄_{n-1}], the gradient from equation 9 can be rewritten as:

$$\nabla([M_{\theta}]_{a}\,\varphi) = [0_{am};\ \varphi_{a};\ 0_{(2n-a-1)m}] \quad \text{for } a < n \qquad (12)$$

$$\nabla([M_{\theta}]_{a}\,\varphi) = [0_{nm};\ \varphi] \quad \text{for } a = n \qquad (13)$$

In one embodiment, parameter values may be shared. For example, the same local parameters may be shared across all slots, e.g. θ̂_i = θ̂. Assuming M_Lθ represents the matrix M with shared local parameters, the gradient from equation 9 can be rewritten as:

$$\nabla([M_{L\theta}]_{a}\,\varphi) = [\varphi_{a};\ 0_{nm}] \quad \text{for } a < n \qquad (14)$$

$$\nabla([M_{L\theta}]_{a}\,\varphi) = [0_{m};\ \varphi] \quad \text{for } a = n \qquad (15)$$

Likewise, global parameters may be shared across slots, e.g. θ̄_i = θ̄. Assuming M_Gθ represents the matrix M with shared global parameters, the gradient from equation 9 can be rewritten as:

$$\nabla([M_{G\theta}]_{a}\,\varphi) = [0_{am};\ \varphi_{a};\ 0_{m(n-a)}] \quad \text{for } a < n \qquad (16)$$

$$\nabla([M_{G\theta}]_{a}\,\varphi) = [0_{nm};\ \varphi_{0} + \ldots + \varphi_{n-1}] \quad \text{for } a = n \qquad (17)$$

In one embodiment, both local parameters and global parameters are shared. Assuming M_Aθ represents the matrix M with shared local and global parameters, the gradient from equation 9 can be rewritten as:

$$\nabla([M_{A\theta}]_{a}\,\varphi) = [\varphi_{a};\ 0_{m}] \quad \text{for } a < n \qquad (18)$$

$$\nabla([M_{A\theta}]_{a}\,\varphi) = [0_{m};\ \varphi_{0} + \ldots + \varphi_{n-1}] \quad \text{for } a = n \qquad (19)$$
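
To make the block structure of equation 11 concrete, the following Python sketch assembles M_θ from per-slot local and global parameter vectors and computes the per-action votes [M_θ]_a φ; n, m, the helper name, and all parameter values are placeholders.

    # Sketch of the slot-structured matrix of equation (11): n local rows,
    # each seeing only its own slot's features, plus one global row that
    # sees all of them. All values below are illustrative placeholders.

    import numpy as np

    def build_M_theta(theta_hat, theta_bar):
        # theta_hat[i], theta_bar[i]: length-m local/global parameter
        # vectors for slot i. Returns the (n+1) x (n*m) matrix of eq. (11).
        n, m = len(theta_hat), len(theta_hat[0])
        M = np.zeros((n + 1, n * m))
        for i in range(n):
            M[i, i * m:(i + 1) * m] = theta_hat[i]   # block-diagonal local rows
            M[n, i * m:(i + 1) * m] = theta_bar[i]   # dense global row
        return M

    n, m = 3, 2
    theta_hat = [np.ones(m) * (i + 1) for i in range(n)]
    theta_bar = [np.ones(m) * 0.1 for _ in range(n)]
    phi = np.arange(n * m, dtype=float)              # stacked slot features
    print(build_M_theta(theta_hat, theta_bar) @ phi) # n local votes + 1 global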

Once the training module 320 develops policy parameters, the policy may be stored in the policy module 130, for instance in the training module 320 or the policy interface 340. The policy module 130 receives a spoken utterance through the microphone 110, and the belief tracking module 330 determines a belief state of user intent based on the spoken utterance. As discussed above, the belief tracking module 330 may implement the functionality of the belief tracking module 120, or may interface with the belief tracking module 120. Alternatively, the belief tracking module 330 may determine a belief state of user intent by any other suitable means. In one embodiment, the belief tracking module 330 determines a set of belief features φ based on the belief state of user intent.

The policy interface 340 retrieves a policy mapping a belief state or belief features to an action and retrieves a belief state or belief features from the belief tracking module 330. The policy interface module 340 applies the retrieved policy to the user's spoken utterance by voting for each potential action in a set of actions based on the retrieved policy parameters and the belief state or belief features of the user utterance. The distribution of votes over the set of actions may be represented by a probability distribution for the set of actions.

In one embodiment, the policy interface 340 utilizes the policy parameter matrix M_θ and set of belief features φ described above, and votes for each action a in a set of potential actions with weights given by the a-th row of M_θ, e.g. [M_θ]_a φ. In this embodiment, a probability distribution may be computed for the set of actions by taking a normalized exponential of these votes.

In one embodiment, the policy interface 340 outputs the votes for the set of actions to the system action module 350. Alternatively, the policy interface 340 may output a probability distribution for the set of actions to the system action module 350. The system action module 350 selects an action to perform based on the output received from the policy interface 340. In one embodiment, the system action module 350 selects the most voted-for action or the most probable action to perform.

The system action module 350 may perform an action using the interface module 140. In the embodiment where the complete set of actions the system action module 350 can perform consists of requesting clarification from a user, confirming a user utterance, or informing a user of information requested by a user, the system action module 350 may use the speech capabilities of the interface module 140 to perform the selected action. For example, if the selected action includes requesting clarification of one or more dialog slots from a user, the interface module 140 may use audio signals to prompt the user to clarify the one or more dialog slots, or may use image signals on a display to prompt the user to clarify the one or more slots. Likewise, if the selected action includes informing the user of information requested by the user, the interface module 140 may use audio signals to inform the user of the information requested by the user, or may use image signals on a display to present the requested information to the user. If clarification or confirmation are requested from the user, the SDS 100 may receive an additional spoken utterance from the user, and the policy module 130 may determine a new action to take based on the additional spoken utterance. This cycle may iteratively repeat until the policy module 130 selects an inform action.

The system action module 350 may select an action outside of the context of requesting clarification, confirming a request, or informing a user. In one embodiment, the SDS is implemented in a vehicle, and the system action module 350 may retrieve information from a local source (such as a vehicle's computer or other vehicle systems) or an external source (such as over the internet using a wireless communication technology like WiFi, 3G, 4G, LTE, WiMax or any other wireless protocol). In one embodiment, the system action module 350 uses a user's or a vehicle's phone system to make a call as requested by the user. The system action module 350 may also be capable of implementing vehicle functions, such as turning on/off headlights and/or windshield wipers, turning on/off or adjusting the radio or media system, selecting a piece of media content to play, turning on/off or interfacing with a navigation system, or any other vehicle function.

Spoken Dialog System Operation

FIG. 5 is a flowchart illustrating the determination of user intent through belief tracking in accordance with one embodiment. The process of determining user intent through belief tracking may optionally include building 500 one or more POTs. Alternatively, a pre-existing POT may be retrieved. The built or retrieved POT includes a plurality of unobserved nodes, each node including an associated probability distribution table. A spoken utterance is received 510 from a user. The spoken utterance is converted 520 into text using automatic speech recognition or any other technique capable of converting spoken language into text.

The converted text is assigned 530 to one or more dialog slots associated with POT nodes using semantic understanding. Semantic understanding may account for the confidence that a spoken utterance was converted into text correctly and the confidence that the text is related to the one or more dialog slots. In one embodiment, the converted text is assigned to multiple combinations of dialog slots, and the combination of dialog slots with the highest assignment confidence is selected. Observed nodes are created 540 in the POT based on the assigned slots. Probability distribution tables are created for the observed nodes based on the slot assignment confidence and the domain of the observed nodes. A belief state is determined 550 based on the joint probability of the probability distribution tables of the POT's unobserved and observed nodes.

FIG. 6 is a flowchart illustrating the creation of a probabilistic ontology tree in accordance with one embodiment. An ontology is retrieved 600. In one embodiment, an ontology may be created, or a retrieved ontology may be modified prior to the creation of a POT. A probability distribution table is determined 610 for each ontology node subject to node type constraints, null semantic constraints, and node essentiality constraints. A Bayesian network is created 620 based on the ontology and the determined probability tables.

FIG. 7 is a flowchart illustrating the performance of an action by a spoken dialog system in response to a vocal query in accordance with one embodiment. A policy is optionally created 700 which maps belief states to actions using a POMDP. In one embodiment, a policy created in advance is retrieved. The policy may map belief state features to actions in one embodiment. A spoken utterance is received 710 from a user. A belief state is determined 720 based on the spoken utterance. In one embodiment, belief state features are determined in place of or in addition to the belief state. An action is selected 730 based on the determined belief state and the retrieved policy. If the selected action requires 740 additional feedback from the user, additional feedback is requested 750 and the process is repeated. If the action does not require additional feedback from the user, the action is performed 760.

FIG. 8 is a flowchart illustrating the creation of a policy in accordance with one embodiment. Initial policy parameters are created 800. In one embodiment, the initial policy parameters are manually selected, selected based on previously created policies, or randomly selected. A simulated user input is created 810, the simulated user input being associated with a pre-determined goal. The belief state is determined 820 based on the simulated user input. In one embodiment, the simulated user input is intentionally obscured by noise in order to best simulate the environment in which an actual user utterance is received and where other audio noise may be present.

An action is performed 830 based on the determined belief state and based on the policy parameters. If the goal is not achieved 840 by the performance of the action, the simulated user input is modified 850, the belief state is re-determined 820, and an action is re-performed 830. If the goal is achieved 840 by the performance of the action, a POMDP reward score is determined 860 for the iterative user input modification and action performance process.

If, after determining the POMDP reward score, it is determined that a policy threshold is not satisfied 870, the policy parameters are modified 880 based on the determined belief states, the performed actions, the number of actions performed, and the determined reward score. Examples of policy thresholds include a pre-determined number of actions being performed, a convergence onto a pre-determined reward score threshold by determined POMDP scores, and a policy parameter modification convergence. If, after determining the POMDP reward score, it is determined that a policy threshold is satisfied 870, a policy is created 890 based on the final policy parameters. The policy may be stored for later use by a spoken dialog system.
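
The control flow of FIG. 8 can be summarized in a short Python sketch; every callable below (input simulator and modifier, belief tracker, action selector, reward and threshold tests, parameter update) is a stand-in supplied by the caller, and only the loop structure mirrors the flowchart.

    # High-level sketch of the FIG. 8 training loop. All callables are
    # caller-supplied stand-ins; only the control flow follows the figure.

    def train_policy(params, create_input, modify_input, belief_of,
                     select_action, reward_of, threshold_met, update_params,
                     max_episodes=1000):
        for _ in range(max_episodes):
            goal, user_input = create_input()            # 810: noisy simulated input
            history = []
            while True:
                belief = belief_of(user_input)           # 820: determine belief state
                action = select_action(params, belief)   # 830: act per current policy
                history.append((belief, action))
                if action == goal:                       # 840: goal achieved?
                    break
                user_input = modify_input(user_input)    # 850: modify simulated input
            score = reward_of(history)                   # 860: POMDP reward score
            if threshold_met(score):                     # 870: policy threshold met?
                return params                            # 890: store final policy
            params = update_params(params, history, score)  # 880: modify parameters
        return params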

Additional Considerations

Reference in the specification to “one embodiment” or to “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least one embodiment. The appearances of the phrase “in one embodiment” or “an embodiment” in various places in the specification are not necessarily all referring to the same embodiment.

Some portions of the detailed description that follows are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps (instructions) leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical, magnetic or optical signals capable of being stored, transferred, combined, compared and otherwise manipulated. It is convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like. Furthermore, it is also convenient at times, to refer to certain arrangements of steps requiring physical manipulations or transformation of physical quantities or representations of physical quantities as modules or code devices, without loss of generality.

However, all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like refer to the action and processes of a computer system, or similar electronic computing device (such as a specific computing machine), that manipulates and transforms data represented as physical (electronic) quantities within the computer system memories or registers or other such information storage, transmission or display devices.

Certain aspects include process steps and instructions described herein in the form of an algorithm. It should be noted that the process steps and instructions could be embodied in software, firmware or hardware, and when embodied in software, could be downloaded to reside on and be operated from different platforms used by a variety of operating systems. The embodiments can also be embodied in a computer program product which can be executed on a computing system.

The exemplary embodiments also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, e.g., a specific computer in a vehicle, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer, which can be in a vehicle. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, application specific integrated circuits (ASICs), or any type of media suitable for storing electronic instructions, each coupled to a computer system bus. Memory can include any of the above and/or other devices that can store information/data/programs. Furthermore, the computers referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.

The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may also be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the method steps. The structure for a variety of these systems will appear from the description below. In addition, the exemplary embodiments are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings as described herein, and any references below to specific languages are provided for disclosure of enablement and best mode.

In addition, the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, the disclosure is intended to be illustrative, but not limiting, of the scope of the embodiments.

While particular embodiments and applications have been illustrated and described herein, it is to be understood that the embodiment is not limited to the precise construction and components disclosed herein and that various modifications, changes, and variations may be made in the arrangement, operation, and details of the methods and apparatuses without departing from the spirit and scope.

What is claimed is:
1. A computer-based method of performing an action in a spoken dialog system comprising: retrieving a policy mapping belief states of user intents to actions; receiving a spoken utterance from a user; determining a belief state of the user's intent based on the spoken utterance using a probabilistic ontology tree (POT) comprising a plurality of unobserved nodes each representing a domain concept and comprising a probability distribution table including probabilities that the concept represented by the unobserved node takes each of a plurality of values, wherein each unobserved node is associated with a dialog slot to which the spoken utterance is assigned based on the relatedness between the spoken utterance and the domain concept represented by the unobserved node; selecting an action to take based on the determined belief state and the retrieved policy; and performing the selected action.
2. The method of claim 1, wherein the selected action comprises an action requiring additional feedback from the user.
3. The method of claim 2, wherein the selected action comprises a request for clarification of all or part of the spoken utterance from the user.
4. The method of claim 2, wherein the selected action comprises a request for confirmation of all or part of the spoken utterance from the user.
5. The method of claim 1, wherein the selected action comprises informing the user of information requested by the user.
6. The method of claim 1, wherein retrieving a policy comprises creating a policy.
7. The method of claim 6, wherein creating a policy comprises: creating policy parameters; creating a simulated user input with a pre-determined goal; determining a simulated belief state of user intent based on the created simulated user input; performing an action based on the simulated belief state and the policy parameters; and outputting the policy parameters if the performed action satisfies the pre-determined goal.
8. The method of claim 7, further comprising: modifying the simulated user input based on the performed action if the performed action does not satisfy the pre-determined goal; determining a second simulated belief state of user intent based on the modified simulated user input; performing a second action based on the second simulated belief state and the policy parameters; and outputting the policy parameters if the performed second action satisfies the pre-determined goal.
9. The method of claim 8, further comprising: determining a partially observable Markov decision process (POMDP) reward score based on the performed actions; and responsive to a determination that a policy threshold is satisfied based on the POMDP reward score, outputting the policy parameters.
10. The method of claim 9, further comprising: responsive to a determination that a policy threshold is not satisfied based on the POMDP reward score, modifying the policy parameters based on one or more of: the simulated belief state, the second simulated belief state, the performed action, the second performed action, and the POMDP reward score; creating a second simulated user input with a pre-determined goal; determining a third simulated belief state of user intent based on the second simulated user input; and performing a third action based on the third simulated belief state and the modified policy parameters.
11. The method of claim 1, wherein the determined belief state comprises belief features of the user's intent.
12. The method of claim 1, wherein determining a belief state of the user's intent based on the spoken utterance comprises: creating an observed node in the POT for each dialog slot associated with an unobserved node to which the spoken utterance is assigned, each observed node comprising a probability distribution table including probabilities that the intended node value of the unobserved node is correctly and incorrectly identified; and determining a belief state of user intent based on the joint probability of the unobserved node probability distribution tables and the observed node probability distribution tables.
13. The method of claim 12, wherein the spoken utterance is converted into text using automatic speech recognition, and wherein the spoken utterance is assigned to one or more dialog slots by determining a confidence score for each assignment based on the likelihood that the text was correctly assigned to a dialog slot.
14. The method of claim 13, wherein the probability distribution tables of the observed nodes are determined based on the confidence score and the domain concept associated with the observed nodes.
15. The method of claim 12, wherein the joint probability of the unobserved node probability distribution tables and the observed node probability distribution tables is determined using a message-passing algorithm in which a child node passes a message to the child node's parent node, and wherein each message comprises the joint probability distributions of the child node and all descendant nodes of the child node.
16. A spoken dialog system for performing an action comprising: a policy module for retrieving a policy mapping belief states of user intents to actions; a microphone for receiving a spoken utterance from a user; a belief tracking module for determining a belief state of the user's intent based on the spoken utterance using a probabilistic ontology tree (POT) comprising a plurality of unobserved nodes each representing a domain concept and comprising a probability distribution table including probabilities that the concept represented by the unobserved node takes each of a plurality of values, wherein each unobserved node is associated with a dialog slot to which the spoken utterance is assigned based on the relatedness between the spoken utterance and the domain concept represented by the unobserved node; a policy interface for selecting an action to take based on the determined belief state and the retrieved policy; and a system interface for performing the selected action.
17. A non-transitory computer-readable storage medium having computer-executable code for performing an action comprising: a policy module configured to retrieve a policy mapping belief states of user intents to actions; a microphone module configured to receive a spoken utterance from a user; a belief tracking module configured to determine a belief state of the user's intent based on the spoken utterance using a probabilistic ontology tree (POT) comprising a plurality of unobserved nodes each representing a domain concept and comprising a probability distribution table including probabilities that the concept represented by the unobserved node takes each of a plurality of values, wherein each unobserved node is associated with a dialog slot to which the spoken utterance is assigned based on the relatedness between the spoken utterance and the domain concept represented by the unobserved node; a policy interface module configured to select an action to take based on the determined belief state and the retrieved policy; and a system interface module configured to perform the selected action.
18. A computer-based method for determining a belief state of a user's intent comprising: receiving a spoken utterance from the user; converting the spoken utterance into text; retrieving a probabilistic ontology tree (POT), the POT comprising a plurality of unobserved nodes, each unobserved node representing a domain concept and comprising a probability distribution table including probabilities that the concept represented by the unobserved node takes each of a plurality of values, wherein each unobserved node is associated with a dialog slot; assigning the text to one or more dialog slots associated with unobserved nodes based on the relatedness between the text and the domain concepts represented by the unobserved nodes; creating an observed node in the POT for each dialog slot to which text is assigned, the observed nodes comprising a probability distribution table including probabilities that the intended node value of the unobserved node is correctly and incorrectly identified; and determining a belief state of user intent based on the joint probability of the unobserved node probability distribution tables and the observed node probability distribution tables.
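For illustration of the message-passing computation recited in claims 12, 15, and 18, the following non-limiting Python sketch computes a belief state over a two-node POT. The node layout, the encoding of observed nodes as per-value likelihoods, and the example domain are assumptions of the sketch rather than features of any claim.

    class PotNode:
        def __init__(self, name, values, cpt, children=(), evidence=None):
            self.name = name
            self.values = values              # possible concept values
            # cpt[(value, parent_value)] = P(value | parent_value);
            # for the root node, cpt[(value, None)] is the prior P(value).
            self.cpt = cpt
            self.children = list(children)
            # evidence[value]: likelihood contributed by the observed node's
            # probability distribution table (e.g., from an ASR confidence
            # score); a slot with no observed node contributes all ones.
            self.evidence = evidence or {v: 1.0 for v in values}

    def message_to_parent(node, parent_value):
        """A child node's message to its parent: it sums over the child's
        values, folding in the child's evidence and, recursively, the
        messages of all of the child's descendant nodes."""
        total = 0.0
        for v in node.values:
            term = node.cpt[(v, parent_value)] * node.evidence[v]
            for child in node.children:
                term *= message_to_parent(child, v)
            total += term
        return total

    def root_belief(root):
        """Normalized belief state over the root concept's values."""
        scores = {}
        for v in root.values:
            s = root.cpt[(v, None)] * root.evidence[v]
            for child in root.children:
                s *= message_to_parent(child, v)
            scores[v] = s
        z = sum(scores.values())
        return {v: s / z for v, s in scores.items()}

    # Example: "activity" is an unobserved root concept; "destination" is a
    # child concept whose observed node reflects that ASR heard "airport".
    dest = PotNode(
        "destination", ["airport", "hotel"],
        {("airport", "travel"): 0.7, ("hotel", "travel"): 0.3,
         ("airport", "leisure"): 0.2, ("hotel", "leisure"): 0.8},
        evidence={"airport": 0.9, "hotel": 0.1})
    activity = PotNode(
        "activity", ["travel", "leisure"],
        {("travel", None): 0.5, ("leisure", None): 0.5},
        children=[dest])
    print(root_belief(activity))  # belief shifts toward "travel"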